📦 Importing Libraries¶
In [59]:
# import the pandas library
import pandas as pd
📁 Load Dataset¶
In [101]:
# read in the csv file as a pandas dataframe
bikes = pd.read_csv(r"C:\Users\Maged\Desktop\Self-Projects\Python Tableau\LondonBikeRides-main\london_merged.csv")
🔍 Exploratory Data Analysis¶
In [103]:
bikes.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 17414 entries, 0 to 17413 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 timestamp 17414 non-null object 1 cnt 17414 non-null int64 2 t1 17414 non-null float64 3 t2 17414 non-null float64 4 hum 17414 non-null float64 5 wind_speed 17414 non-null float64 6 weather_code 17414 non-null float64 7 is_holiday 17414 non-null float64 8 is_weekend 17414 non-null float64 9 season 17414 non-null float64 dtypes: float64(8), int64(1), object(1) memory usage: 1.3+ MB
In [105]:
bikes.shape
Out[105]:
(17414, 10)
In [107]:
bikes
Out[107]:
In [109]:
# count the unique values in the weather_code column
bikes.weather_code.value_counts()
Out[109]:
weather_code 1.0 6150 2.0 4034 3.0 3551 7.0 2141 4.0 1464 26.0 60 10.0 14 Name: count, dtype: int64
In [111]:
# count the unique values in the season column
bikes.season.value_counts()
Out[111]:
season 0.0 4394 1.0 4387 3.0 4330 2.0 4303 Name: count, dtype: int64
🧹Data Cleaning¶
In [113]:
# specifying the column names that I want to use
new_cols_dict ={
'timestamp':'time',
'cnt':'count',
't1':'temp_real_C',
't2':'temp_feels_like_C',
'hum':'humidity_percent',
'wind_speed':'wind_speed_kph',
'weather_code':'weather',
'is_holiday':'is_holiday',
'is_weekend':'is_weekend',
'season':'season'
}
# Renaming the columns to the specified column names
bikes.rename(new_cols_dict, axis=1, inplace=True)
In [115]:
# changing the humidity values to percentage (i.e. a value between 0 and 1)
bikes.humidity_percent = bikes.humidity_percent / 100
In [117]:
# creating a season dictionary so that we can map the integers 0-3 to the actual written values
season_dict = {
'0.0':'spring',
'1.0':'summer',
'2.0':'autumn',
'3.0':'winter'
}
# creating a weather dictionary so that we can map the integers to the actual written values
weather_dict = {
'1.0':'Clear',
'2.0':'Scattered clouds',
'3.0':'Broken clouds',
'4.0':'Cloudy',
'7.0':'Rain',
'10.0':'Rain with thunderstorm',
'26.0':'Snowfall'
}
# changing the seasons column data type to string
bikes.season = bikes.season.astype('str')
# mapping the values 0-3 to the actual written seasons
bikes.season = bikes.season.map(season_dict)
# changing the weather column data type to string
bikes.weather = bikes.weather.astype('str')
# mapping the values to the actual written weathers
bikes.weather = bikes.weather.map(weather_dict)
In [119]:
# checking our dataframe to see if the mappings have worked
bikes.head()
Out[119]:
🔄 Export Cleaned Data for Tableau¶
In [121]:
# writing the final dataframe to an excel file that we will use in our Tableau visualisations. The file will be the 'london_bikes_final.xlsx' file and the sheet name is 'Data'
bikes.to_excel(r'C:\Users\Maged\Desktop\Self-Projects\Python Tableau\LondonBikeRides-main\london_bikes_final.xlsx', sheet_name='Data')